Statistical Approach With Factored Translation Models For Indian Languages
نویسنده
چکیده
Factored translation models are an extension to phrase based statistical translation models which integrate additional annotation at word level. Here we present a study of statistical models and approaches to translate Hindi to English. Experiments were also conducted on alignment models using various word groupings and using GIZA++ to predict their English translations and fertility. TAJ A new word alignment model was developed for Hindi to English. It used a bootstrapping mechanism and a Expectation maximization algorithm along with POS tags to boost probability of words to create a bilingual dictionary. TAJ is compared with GIZA++ for its accuracy on a parallel
منابع مشابه
English-Latvian SMT: knowledge or data?
In cases when phrase-based statistical machine translation (SMT) is applied to languages with rather free word order and rich morphology, translated texts often are not fluent due to misused inflectional forms and wrong word order between phrases or even inside the phrase. One of possible solutions how to improve translation quality is to apply factored models. The paper presents work on Englis...
متن کاملStatistical Machine Translation for Indian Languages: Mission Hindi
This paper discusses Centre for Development of Advanced Computing Mumbai’s (CDACM) submission to the NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2014 (collocated with ICON 2014). The objective of the contest was to explore the effectiveness of Statistical Machine Translation (SMT) for Indian language to Indian language and English-Hindi machine translation. ...
متن کاملStatistical Translation Models: A Literature Survey
In this survey, we briefly study Phrase-based, Factored and Hierarchical translation models. First we learn basics of Phrase-based model. Then we get introduced to an interesting SMT approach called Factored translation models. We also study mathematical modeling of the Factored models. Finally, we compare Factored models with Phrase-based models and know their disadvantages which are pulling t...
متن کاملImproving SMT for Baltic Languages with Factored Models
This paper reports on implementation and evaluation of English-Latvian and Lithuanian-English statistical machine translation systems. It also gives brief introduction of project scope – Baltic languages, prior implementations of MT and evaluation of MT systems. In this paper we report on results of both automatic and human evaluation. Results of human evaluation show that factored SMT gives si...
متن کاملThe Transition of Phrase based to Factored based Translation for Tamil language in SMT Systems
Machine translation is one of the major and the most active areas of Natural language processing. Machine translation (MT) is an automatic translation of one natural language into another using computer generated instructions. The utility and power of Statistical Machine Translation (SMT) seems destined to change our technological society in profound and fundamental ways. The current state-of-t...
متن کامل